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Abstract 

In [P90] we proposed to employ Vandermonde and Hankel multipliers to transform into each 
other the matrix structures of Toeplitz, Hankel, Vandermonde and Cauchy types as a means of 
extending any successful algorithm for the inversion of matrices having one of these structures 
to inverting the matrices with the structures of the three other types. Surprising power of 
this approach has been demonstrated in a number of works, which culminated in ingeneous 
numerically stable algorithms that approximated the solution of a nonsingular Toeplitz linear 
system in nearly linear (versus previuosly cubic) arithmetic time. We first revisit this powerful 
method, covering it comprehensively, and then specialize it to yield a similar acceleration of 
the known algorithms for computations with matrices having structures of Vandermonde or 
Cauchy types. In particular we arrive at numerically stable approximate multipoint polynomial 
evaluation and interpolation in nearly linear time, by using 0(fonlog'' n) flops where h — 1 for 
evaluation, h = 2 for interpolation, and 2"' is the relative norm of the approximation errors. 

Keywords: Transforms of matrix structures, Vandermonde matrices, Cauchy matrices, Multipole 
method, HSS matrices. Polynomials, Rational functions. Multipoint evaluation. Interpolation 
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1 Introduction 



Table [T] displays four classes of most popular structured matrices, virhich are omnipresent in modern 
computations for Sciences, Engineering, and Signal and Image Processing and which have been 
naturally extended to larger classes of matrices, T, V, and C, having structures of Toeplitz, 
Hankel, Vandermonde and Cauchy types, respectively. Such matrices can be readily expressed via 
their displacements of small ranks, which implies a number of their further attractive properties: 

• Compact compressed representation through a small number of parameters, typically 0{n) 
parameters in the case of n x n matrices 

• Simple expressions for the inverse through the solutions of a small number of linear systems 
of equations wherever the matrix is invertible 

• Multiplication by vectors in nearly linear arithmetic time 

• Solution of nonsingular linear systems of equations with these matrices in quadratic or nearly 
linear arithmetic time 
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Table 1: Four classes of structured matrices 



Toeplitz matrices T — (ij-j)"^^]^ 
/ to t^i ■ ■ ■ 



to 



\tn-l 



tl to ) 



Vandermonde matrices V = V^= ( S; | 
1 S2 ••• s"" 



Hankel matrices H = (^j+j)"j_Q 
f ho hi ■■■ hn-i \ 
hi /i2 ■■ hn 



\hn-l hr. 



Cauchy matrices C = Cs.t = 



Si — tj 



\1 Sn 



si-tl 

1 
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Extensive and highly successful research and implementation work based on these properties 
has been continuing for more than three decades. We follow |P90] and employ structured matrix 
multiplications to transform the four structures into each other. For example, TH = HT = H, 
WK = T, and V'^V is a Hankel matrix. The paper |P9Q) showed that this technique enables one to 
extend any successful algorithm for the inversion of the matrices of any of the four classes T, %, 
V, and C to the matrices of the three other classes. We cover this technique comprehensively and 
simplify its presentation versus |P90| because instead of the the Stein displacements M — AMB in 
[P90| we employ the Sylvester displacements AM — MB and the machinery of operating with them 
from [POO] and [POTI Section 1.5]. 

The proposed structure transforms are simple but have surprising power where the transform 
links matrix classes having distinct features. For example, the matrix structure of Cauchy type 
is invariant in row and column interchange (in contrast to the structures of Toeplitz and Hankel 
types) and enables expansion of the matrix entries into Loran's series (unlike the structures of the 
three other types). Exploiting these distinctions has lead to dramatic acceleration of the known 
numerically stable algorithms for Toeplitz and Toeplitz-like linear systems of equations by means of 
their transformation into Cauchy-like matrices and exploiting the above properties of these matrices. 

Their invariance to row interchange enabled numerically stable solution in quadratic rather than 
cubic ti me in |GK095| . [G98] . [R06] . but the paper |MRT05j (cf. also |CGS07| . |XXG12j . and 
[XXCBj ) has instead exploited the Loran's expansion of the entries of the basic Cauchy matrices 
to obtain their close approximation by HSS matrices. ("HSS" is the acronym for "hierarchically 
semiseparable" .) This structure of a distinct type enabled application of the Multipole/HSS powerful 
techniques, and the resulting numerically stable algorithms approximate the solution of a nonsingular 
Toeplitz linear system of equations in nearly linear (and thus nearly optimal) arithmetic time. 
The intensive work in |XXG12) and jXXCBj on extension, refinement and implementation of the 
algorithms has already made them quite attractive for the users. 

Similar advance has not been achieved, however, for the computations with matrices having 
structures of Vandermonde or Cauchy types. All the cited papers on Toeplitz computations share 
their basic displacement map, which is a specialization of our general class of the transformations 
of matrix structures derived from [P90| (see our comments at the end of Section [53)) . The map 
transforms the matrices with the structure of Toeplitz type into the matrices of the subclasses of 
the class C linked to FFT and defined by the knot sets {si, . . . , Sn,ti, . . . ,tn} equally spaced on 
the unit circle {z : \z\ — 1} of the complex plane. This covers the structures of Toeplitz but not 
Vandermonde and Cauchy types. 
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In our present paper we specify the subclass of CV and CV-like matrices, which are the Cauchy 
and Cauchy-hke matrices, respectively, having at least one (but not necessarily both) of their two 
basic knot sets {si, . . . , s„} or {ti, . . . , t„} equally spaced on the unit circle {z : \z\ = 1}. These 
are precisely the Cauchy and Cauchy-like matrices that have FFT-type structured transforms into 
the matrices of the class V or their transposes. Under this framework our main technical step is 
an extension of the algorithms of IMRTOSj . |CGS07| . and [XXG12 to proving that aU CV and CV- 
like matrices can be closely appoximated by HSS matrices. As soon as such an approximation is 
available, one just needs to apply the Multipole method to the HSS matrices to obtain numerically 
stable approximation algorithms that run in nearly linear time for our tasks for CV matrices, versus 
quadratic time of the known algorithms. By applying the FFT-based structured transforms between 
matrices with the structures of CV and Vandermonde types, we readily extend these results to 
the matrices of the latter class and consequently to the problems of multipoint evaluation and 
interpolation for polynomials. 

The new algorithms approximate within relative error norm bound the product of an n x n 
CV matrix by a vector by using 0{bn log n) flops and the solution of a nonsingular CV linear system 
of n equations by using 0{bn log^ n) flops. FFT-based structured transforms extend these algorithms 
and complexity bounds to computations with Vandermonde matrices and to approximate multipoint 
evaluation and interpolation for polynomials. The resulting nearly linear time bounds are nearly 
optimal, but still seem to be overly pessimistic, in view of the results of the extensive tests in |XXG12j 
for the similar HSS computations (see our Remark [55]) . The cited results are readily extended to 
CV-like matrices and consequently to the matrices having structure of Vandermonde type. 

Various extensions, ameliorations, refinements, and nontrivial specializations of the proposed 
methods can be interesting. Most valuable would be new transforms among various new classes of 
structured matrices, with significant algorithmic applications. At the end of Section [5] we sketch a 
natural extension of our techniques to the general class of Cauchy and Cauchy-like matrices, but 
indicate that this generally complicates the control over the output errors. It can be interesting 
that even a very crude variant of our techniques (which proceeds with a limited use of the HSS 
algorithms) still accelerates the known numerical algorithms for multipoint polynomial evaluation 
by a factor of y^n/ \ogn (see Remark [34]) . 

For a sample further application, recall that the current best package of subroutines for poly- 
nomial root-finding, MPSolve, is reduced essentially to recursive application of the Ehrlich-Aberth 
algorithm, and consequently to recursive numerical multipoint polynomial evaluation. For this task 
MPSolve uses a quadratic time algorithm, which is the users' current choice. So our present ac- 
celeration from quadratic to a nearly linear time can be translated into the same acceleration of 
MPSolve. 

We organize our pesentation as follows. After recalling some definitions and basic facts on general 
matrices and on four classes of structured matrices T, H, V, and C in the next three sections, we cover 
in some detail the transformations of matrix structures among these classes in Section [5l recall the 
class of HSS matrices in Section [6l estimate numerical ranks of Cauchy and Cauchy-like matrices of 
a large class in Section [71 and extend these estimates to compute the HSS approximations of these 
matrices in Section |S] and to approximate the products of these matrices and their inverses by a 
vector in Section [9l We conclude the paper with Section [TOl 

For simplicity we assume square structured matrices throughout, but our study can be readily 
extended to the case of rectangular matrices. 

2 Some definitions and basic facts 

Hereafter "flop" stands for "arithmetic operation"; the concepts "large", "small", "near", "close", 
"approximate" , "ill conditioned" and "well conditioned" are quantified in the context. Next we recall 
and extend some basic definitions and facts on computations with general and structured matrices 
(cf. [GL96] . [S98] . [POl] l. 
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2.1 General matrices 



M — ("^ij)™=i is an m X n matrix, M'^ and are its transpose and Hermitian (complex 

conjugate) transpose, respectively. We write M^-^ for {M^)~^ = {M~^)^ . 

{Bi I ... I Bn) denotes a 1 x n block matrix with the blocks . . . , i3„. diag(i3i, . . . , = 
diag(i3j)"^]^ is an n x n block diagonal matrix with the diagonal blocks Si, ... , i?„. In the case 
of scalar blocks si, . . . , s„ we arrive at a vector s = (sj)"^i and an n x n diagonal matrix Ds = 
diag(s) = diag(sj)"^^ with the diagonal entries si, . . . , s„. 

The n coordinate vectors ei,...,e„ of a dimension n form the nx n identity matrix /„ = 
(ei I ... I e„) and the n x n reflection matrix J„ = (e„ | ... | ei). Jn = ^ ■ We write / 
and J where the matrix size is not important or is defined by context. 

Preprocessors. For three nonsingular matrices P, M, and N and a vector b, the equations 

= N{PMN)-^P, PMNy = Ph, x = Ny (1) 

reduce the inversion of the matrix M and the solution of a linear system of equations Afx = b to 
the inversion of the product PMN and the solution of the linear system PMNy = Ph, respectively. 
For some important classes of matrices M this preprocessing can simplify dramatically the inversion 
of a matrix and the solution of a linear system of equations. 

Generators. Given an m x n matrix M of a rank r and an integer / > r, we have a nonunique 
expression M = FG^ for pairs {F, G) of matrices of sizes m x I and n x respectively. We call such 
a pair [F, G) a generator of length I for the matrix M, which is the shortest for I — r. 

Theorem 1. (Cf. JBASOf . fMSS, WMl, fGE96f . fPOli Section 4.6.2].) Given a generator of a 
length I for an n x n matrix M having a rank r, r < I < n, it is sufficient to use 0(J?n) flops to 
compute a generator of length r for the matrix M . 

Norm, conditioning, orthogonality, numerical rank, a perturbation norm bound. 

||M|| = ||M^|| = ||M||2 is the (Euclidean) 2-norm of a matrix M. 

For a fixed tolerance r the minimum rank of matrices in the r-neighborhood of a matrix M is 
said to be its T-rank. The numerical rank of a matrix is its r-rank for a small positive r. A matrix 
is called ill conditioned if it has a close neighbor of a smaller rank or eqiuivalently if its rank exceeds 
its numerical rank. Otherwise it is called well conditioned. If a matrix M is ill conditioned, one 
must compute its inverse and the solution of a linear system Mx = f with a high precision to ensure 
meaningful output for these problems, but not for multiplication by a vector. 

A matrix M is unitary or orthogonal if M — I or MM^ = I. It is quasiunitary if cM is 
unitary for a constant c. Such a matrix U has full rank and is very well conditioned: its distance to 
the closest matrix of a smaller rank is equal to \\U\\ — 1. 

Theorem 2. (See fSM Corollary 1.4.19] for P = ~M-^E.) Suppose M and M + E are two 
nonsingular matrices of the same size and \\M^^E\\ — 6 < 1. Then \\I — {Ad + E)^^M\\ < and 
II I (M + £:)-! -M-i|| < j4e||M-i||. In particular \\\{AI + E)-^ - < 0.5||M-i|| if 6 < 1/3. 

2.2 The classes of Toeplitz, Hankel, Vandermonde and Cauchy matri- 
ces, some subclasses and factorizations, polynomial evaluation and 
interpolation 

For larger integers n the entries of an n x n Vandermonde matrix Vg vary in magnitude greatly 
unless I Si I ~ 1 for all i, as is the case with the Vandermonde matrices and fl^ below, which are 
unitary up to scaling by 

DFT and DFT-based matrices. (See |BP94| Sections 1.2, 3.4].) Write w„ = exp(^v^) to 
denote a primitive nth root of 1. Its powers l,u;„, . . . ,a;^~^ are equally spaced on the unit circle 
{z : \z\ — 1}. Let V, = iln — {^l{)i'jlo denote the nxn matrix of DFT, that is of the discrete Fourier 
transform at n points. D, and fl^ are quasiunitary, whereas and -^fi^ and — -^i^^ are 

unitary matrices, because ftn^ = nl. See, e.g., |BP94[ Sections 1.2 and 3.4] on a proof of the 
following theorem and on the numerical stability of the supporting algorithms. 
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Theorem 3. For any vector v = (wi)"^^ ""^^ '^'^'^ compute the vectors flv and by using 

0{n\ogn) flops. If n = 2^ is a power of 2, then one can compute the vectors ilv and il^^v hy using 
0.5nlog2 n and O.Snlogj n + n flops, respectively. 



Cauchy and Vandermonde matrices and polynomial evaluation and interpolation. 

(See Table [Hand [POTI Chapters 2 and 3].) It holds that 

Cs,t = (2) 

C.,t = diag(i(s.)-^)r=iV;i4-i diag(i'(i,))Li (3) 

where s = (sj)f^i, t = (t,)f^i, and = ^ 

Equation ([3]) expresses a Cauchy matrix Cs,t through the Vandermonde matrix T4, the inverse 
V^^ of the Vandermonde matrix Vt, the coefHcients of the auxiliary polynomial t{x) defined by its 
roots to, ■ ■ ■ ,iri-i, and the values of this polynomial and its derivative t'{x)^ each at n points. Part 

(i) of the following simple theorem states that polynomial multipoint evaluation and interpolation 
with the knots si,...,Sn are equivalent to multiplication of the Vandermonde matrix 14 by the 
coefficient vector of the polynomial and the solution of the associated linear system of equations with 
this matrix, respectively. Part (ii) of the theorem shows shows equivalence of rational multipoint 
evaluation and interpolation to the similar equations for Cauchy (rather than Vandermonde) matrix. 
Part (iii) of the theorem shows that the reconstruction of the polynomial coefficients from the roots 
can be reduced to polynomial interpolation and consequently to solving a Vandermonde linear system 
of equations, on these links and similar links of rational multipoint evaluation and interpolation to 
Cauchy matrices. Together with equation ([3]), the theorem also links multipoint evaluation and 
interpolation for polynomials to the same tasks for rational functions (cf. |P011 Chapter 3]). 

Theorem 4. (i) Let p{x) ^ Yl'i^o P^^' ' P = iPi)i=Q ^ ^ = (■s^)"Jo^ and v = ivi)i~Q . Then 
the equations p{si) — Vi hold for i — 0, 1, ... ,n — 1 if and only if T4p = v. (ii) For a rational 
function v{x) — Jt't ^^^^ distinct poles ti, . . . ,tn and for n distinct scalars si, . . . , Sn, write 

s = {si)f^i, t — (ij)"^X' ^ ^ {uj)j=i! V = (ui)"^]^. Then the equations Vi = v{si), i = 1, . . . , n hold 
if and only if Cg.tu = v. (iii) The equation YVi^oi^ ~ ^i) = + v{x), for v{x) — J27=o ^^^^ ^'^'^ 
for n distinct knots to, . . . ,t„_i, is equivalent to the linear .system of n equations, v(ti) — — for 
i — 0, . . . ,n — 1. 

Theorem 5. (i) det{V) = Y[i<ki^i - det(C) = YiiKji^i " ~ ^i)/ Hijl^i ^ ^j); ^"-^ 

the matrices V and C of Tahle\^ are nonsingular where all scalars si, . . . , s„, ti, . . . , t„ are distinct. 

(ii) A row interchange preserves both Vandermonde and Cauchy structures. A column interchange 
preserves Cauchy structure. 

Next we will specify a subclass of Cauchy matrices most closely linked to Vandermonde and 
transposed Vandermonde matrices (cf. Definition [5]). At first write 

V, = iU^'y-Xi-i = ndiogt/'"')?.,, (4) 

for two distinct scalars e and / 7^ 0. Then observe that = Vi = T4 for s = ft^ — Vt for 

t = (w^^*)"^! (so both and fl^are Vandermonde matrices), Csj — Cs,t, Ce,t — Cs,t, C'e,/ — Cs,t 
where s = {eu!^~^)2j^i and/or t — (/w^i~^)f j^i, and the matrices Vf are quasiunitary where |/| — 1. 

Now let t = (/wr^)i=i: obtain t(a;) = x"-/", t{x) = na;"-\ i(sO = sf-/", t'{t,) = nf^-^u^-^ 
for all j, and nV7^ — diag(/^^')"^;^r2^, substitute into ([21), and obtain 



C.,/ = diag [4!^ ) diag(/i-')ILif^^ diag(a;i-^)ILi. (6) 
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If in addition s — {ecul^ then 5^ — for all i and Vs — Vf. Substitute into ^ and obtain 

Definition 6. Hereafter we refer to the niatrices Vf, Csj, Ce,t, and Cej for all scalars e and / as 
FV, FC, CF, and FCF matrices, respectively. We refer to the matrices Cgj and Ce,t as CV matrices 
and to the FV niatrices Vf and the FCF niatrices Cej as the DFT-hased matrices. 

Equations (|4]) and ([7]) link the DFT-based matrices to the DFT matrix Vl. Similarly to this 
matrix they have their basic sets of knots § — {si,...,s„} and T = {ti, . . . ,tn\ equally spaced on 
the unit circle {z : \z\ = 1}. Equations © link the CV matrices to Vandermonde matrices Vs and 
Vt, respectively. Combine equation ^ and (O with Theorem [3] to obtain the following results. 

Theorem 7. 0{n\ogn) flops are sufficient to compute the product Mf of a DFT-based Vander- 
monde or Cauchy n x n matrix M and a vector f . // the matrix M is nonsingular, then 0{n\ogn) 
flops are also sufficient to compute the solution x to a linear system of n equations A/x = f. 

/-circulant matrices. = I ^ ) ^^"^ n x n matrix of /-circular shift for a scalar /, 

JZfJ = Zj, JZjj = Zf (8) 

for any pairs of scalars e and /, and if / 7^ 0, then 

^f^ = ^1//^- (9) 

^/(^) ~ X]"=i '^iZf~^ is an /-circulant matrix, called circulant for / = 1. It is a Toeplitz matrix 
defined by its first column v = {vi)f^i and by a scalar / 7^ 0. It can be called a DFT-based Toeplitz 
matrix in view of the following results. 

Theorem 8. (See \CPW74l ) We have Zi(v) VL-^ D{Q.w)VL. More generally, for any f ^ 0, we 
have Zf,^{-v) = Vf^D{Vfv)Vf where ft = «^)"7io thenxn matrix of DFT, D{u) = diag(w,);"ro^ 
for a vector u = {ui)"~Q and the matrix Vf = £7 diag(/*)"jQ^ of Q). 



The complexity of computations with ToepUtz, Hankel, Cauchy and Vandermonde 
matrices. Theorems [3] and [8] combined support numerically stable computation of the product by 
a vector of an /-circulant matrix Ze(u) (as well as of its inverse if the matrix is nonsingular) by 
using 0(ri log n) flops. We can extend this cost bound to multiplication of a Toeplitz matrix T of 
Table[l]by a vector, by embedding the matrix into a 2*^ x 2*^ circulant matrix for k = [log2(2n— 1)]. 
Each of pre- and post-multiplication by the matrix J, that is the cyclic interchange of rows or 
columns, transforms a Toeplitz matrix into a Hankel matrix and vice versa, and therefore transforms 
accordingly the algorithms for matrix inversion and solving a linear systems of equations. 

Numerically unstable algorithms using nearly linear number of fiops (namely 0(nlog^ n) flops) 
are known for multiplying general Vandermonde and Cauchy nxn matrices by a vector and solving 
Toeplitz, Hankel, Vandermonde and Cauchy nonsingular linear systems of n equations (cf. [POll 
Chapter 2 and 3]). Numerically stable known algorithms for all these problems run in quadratic 
arithmetic time, except that numerically stable algorithms of |MRT05) . |CGS07) and |XXG12] ap- 
proximate the solution of nonsingular Toeplitz linear systems in nearly linear time. We seek extension 
of the latter algorithms to Vandermonde and Cauchy computations. 

3 The structures of Toeplitz, Hankel, Vandermonde and 
Cauchy types. Displacement ranks and generators 

We generalize the four classes of matrices of Table [T] by employing the Sylvester displacements 
AM — MB where the pair of operator matrices A and B is associated with a fixed matrix structure. 
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(See |P011 Theorem 1.3.1] on a simple link to the Stein displacements M — AMB.) The rank, the t- 
rank, generators, and r-generators of the displacement of a matrix M (for a fixed operator matrices 
A and B and tolerance r) are called displacement rank (denoted dA,B{M))^ t- displacement rank, 
displacement generator, and t- displacement generator of the matrix M, respectively (cf. |KKM79"] ). 
[PUT] . jBMOl] ). In our Theorems [TT] and fT3l we write (t). (h), (th), (v), (w^), and (c) to indicate the 
matrix structures of Toeplitz, Hankel, Toeplitz or Hankel, Vandermonde, transposed Vandermonde, 
and Cauchy types, respectively, which we define next. 

Definition 9. If the displacement rank of a matrix is small (in context) for a pair of operator 
matrices associated with Toeplitz, Hankel, Vandermonde, transpose of Vandermonde or Cauchy 
matrices in Theorem [TT] below, then the matrix is said to have the structure of Toeplitz, Hankel, 
Vandermonde, transposed Vandermonde or Cauchy type, respectively. Hereafter T, H, V, , and 
C denote the five classes of these matrices (cf. Table [2]). The classes V, , and C consist of distinct 
subclasses Vg, V^, and Cs,t defined by the vectors s and t and the operator matrices Ds and Dt, 
respectively, or equivalently by the bases T4 and Cs,t of these subclasses. To simplify the notation we 
will sometimes drop the subscripts s and t where they are not important or are defined by context. 

Definition 10. (Cf. Definition [51) In the case where the vectors s and t turn into the vectors 
e((x)*j^^)"^j and /(w^~^)"=i for some scalars e and /, we define the matrix classes = LigVe, 
J-C = UfCsj, CJ- = UeCe,t, and TCJ- — UejCe,/ where the unions are over all complex scalars e 
and /. These matrix classes extend the classes of FV, FC, CF, and FCF matrices, respectively. We 
also define the classes CV (extending the CV matrices) and V^J- = UeVj. We say that they consist 
of FV-\ike, F-^F-like, i^C-like, CF-like, FCF-like, and CT^-like matrices, which have structures of 
J-"V-type, V'^'j^-type, J^C-type, CJ^-type, J^CJ^-type, and CV-type, respectively. 

One can readily verify the following results. 

Theorem 11. Displacements of basic structured matrices. 

(th) For a pair of scalars e and f and two matrices T (Toeplitz) and H (Hankel) of Tablel^ the 
following displacements have ranks at most 2 (see some expressions for the shortest displacement 
generators in W01[ Section 4-2]), 

ZeT - TZf, ZJT - TZj, ZJH - HZf and Z^H - HZj. 

(v) For a scalar e and a Vandermonde matrix V of Table{l\we have 

FZe = Z?,F-(sr-e)r=ie^, (10) 

ZjV^ = V^D.-e,,{{s2-e)Uf. (11) 
Consequently the displacements DsV — VZ^ and ZjV^ ~ V^ Ds have rank at most 1 and vanish if 

si = eioT i^l,...,n. (12) 

(c) For two vectors s = (si)f^x '^"■'^ ^ ~ (^i)"=i having 2n distinct components, a Cauchy matrix 
C of Tai/eQl and the vector e = (1, . . . , 1)^ of dimension n, filled with ones, we have 

DsC - CDt = ee^, rank(L>sC - CDt) = 1. (13) 

The following theorem shows that variation of the scalars e and /, defining the operator matrices 
Zf, and Zf, makes negligible impact on the matrix structure, and so the classes T, H, V, and V"^ do 
not depend on the choice of these scalars. 

Theorem 12. For two scalars e and f and five matrices A, B, C, D, and M we have dc.oiM) — 
dA,B{M) < 1 where either A — C , B ^ Z^, D ~ Zf or A = C, B = Z]: , D = Zj and similarly 
where either B ^ D, A ^ Ze, C ^ Zf or B = D, A ^ Z'[ , C ^ Zj . 
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Table 2: Operator matrices for the seven classes T, U, Vs, V^^, V""^, and Cs,t 



r 


n 




v-1 






Cs,t 




(zLZf) 








(Ds.Z^) 




izLzJ) 


{Ze,Zj) 













Proof. The matrix Zi, — Zc = (6 — c)eie^ has rank at most 1 for any pair of scalars b and c. Therefore 
the matrices {ZbM - MB) - {Z^M - MB) = Z^M - Z^M = {Zb-Zc)M and [AZ - MZb) - {AZ - 
MZc) = -M{Zb - Zc) have ranks at most 1. □ 

Table [5] displays the pairs of operator matrices associated with the matrices of the seven classes 
T, v., Vs, Vg"^, Vj, Vg"^, and Cs,t- Five of these classes are employed in Theorems [TT] and [T^ V^^ 
and Vs""^ denote the classes of the inverses and the transposed inverses of the matrices of the class 
Vs, respectively. We obtain the pairs of their associated operator matrices by interchanging the 
matrices in the pairs of the operator matrices for the classes Vs and Vj, respectively (see equation 
(ITTI) of the next section). 

The following theorem expresses the n? entries of an n x n matrix M through the 2dn entries 
of its displacement generator (i^, G) defined under the operator matrices of Theorem [TT] and Table 
m See some of these and other expressions for various classes of structured matrices through their 
generators in [G094], [POTl Sections 4.4 and 4.5], and |PW03) . 

Theorem 13. Suppose si, . . . , s„, ti, . . . , t„ are 2n distinct scalars, s = {sk)k^i, t = V ~ 

(■^i~^)?k=i' G — { s-tk )'ik=i' ^ '^'^'^ / '^''^ ^"^^ distinct scalars, fi, . . . , f^, gi, . • . , gd are 2d vectors of 
dimension n, Ui, . . . , u„, Vi, . . . , v„ are 2n vectors of dimension d, and F and G are n x d matrices 



/ui 

such that F = 



(gi I ••• I gd). Then 



- (fi I • • • I fd), G = 
(t) (e - f)M = ^e(f,)^/(Jg,) ifZcM - MZf - FG^, e ^ f; 

(e - f)M = E -=1 Zc{M,)^Zf{^,)^ = Ze{M,)Zf(^,)J if Z^M - MZj - FG^ , e ^ f, 

(h) (e - f)M = E ■=! Zc{i,)Zf{^,)J if Z,M ~ MZj = FG^ , e ^ f; 
(e - f)M = ■=! Ze{J{,)Zf{Jg,)^ if Z^M - MZf = FG^ , e ^ f, 

(v) M = diag(^)f^iE^t,diag(f,)l^Ze(Jg,) if D,M - MZ, = FG^ and if ^ e for 

1 \ _ _ ft' 

(IF) diag(^)^^i Y.Ui ^e(Jf,)^^^diag(g,) */ Z^M - MD, = FG^ and if <^ ^ e for 
i^l,...,n; 

(c) M = diag(f, )C diag(g, ) = { ^ if D,M ~ MDt = FG^ . 

Proof. Parts (t) and (h) are taken from 'POIJ Examples 4.4.2 and 4.4.4]. Part (c) is taken from jPOH 
Example 1.4.1]. To prove part (v), combine the equations DtM — MZe = FG^ and Z^Z'^i^ = I (cf. 
dH)) and deduce that M - DtMZ'[^^ ^ -F{Zi/cG)'^. Then obtain from [POTl Example 4.4.6 (part 
b)] that M = ediag(^)f^iE;=idiag(f,)FZi/e(Zi/eg,)^. Substitute eZy,{Zy,g,) = Ze(Jgj)^ 

and obtain the claimed expression of part (u). Next transpose the equation Z^ AI — M Dt — FG^ and 
yield DtM'^-M^Ze = -GF^ . From part {v) obtain = diag(j^)f^i f^J^^ diag(gj)FZe( Jf,). 

Transpose this equation and arrive at part {v^). □ 

By combining the estimates of the previous section for the cost of multiplication by a vector of 
Toeplitz, Hankel, Vandermonde, transpose of Vandermonde and Cauchy matrices with Theorem [T3l 
we obtain the following results. 
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Theorem 14. Given a vector w of a dimension n, one can compute the product Mv by using 
0{dn\ogn) flops for an n x n matrix M in the classes T or H and by using 0{dn\og^ n) flops for 
an n X n matrix M in V, V^, or C. 

Remark 15. By virtue of Theorem [T51 the displacement operators M AM — MB are nonsingular 
provided that e ^ f in parts {t) and {h) and that 7^ e for i = 1, . . . , n in parts (v) and {v'^)- We 
can apply Theorem [T2l to satisfy these assumptions. 

Remark 16. (Cf. part (ii) of Theorem [5]) Parts (v) and (c) of Theorem [13] imply that a row 
interchange preserves the matrix structures of the Vandermonde and Cauchy types, whereas a column 
interchange preserves the matrix structures of the transposed Vandermonde and Cauchy types. 

4 Matrix operations in terms of displacement generators 

We can pairwise multiply and invert structured matrices faster if we express the inputs and the 
intermediate and final results of the computations through short displacement generators rather 
than the matrix entries. Such computations are possible by virtue of the following simple results 
from [POO] and [POTl Section 1.5] (extending [P90] ). 

Theorem 17. Assume five matrices A, B, C , M and N and a pair of scalars a and (3. Then as 
long as the matrix sizes are compatible we have 

A{aM + /3N) ~ {aM + PN)B = a{AM - MB) + /3(AiV - NB), (14) 

A^M'^ - B^M^ = -{BM - MAf, (15) 
A{MN) - {MN)C = {AM - MB)N + M{BN - NC). (16) 
Furthermore for a nonsingular matrix M we have 

AM'^ - M-^B = -M-\BM - MA)M'\ (17) 

Corollary 18. For five matrices A, B, F , G, and M of sizes mxm, nxn, mxd, nxd, and mxm, 
respectively, let us write F = Fa,b{M), G = Ga,b(M), and d = dA,B{M) if AM - MB = FG^. 
Then under the assumptions of Theorem [I^ we have 

FaM^^I + PN) = (aFA^BiM) I l3FA^BiN)), 
GA,B{aM + (3N) = (GaMM) \ Ga,b{N)), 
Fa,b{M^) = -Gbt^at{M^), GaMM^) = Fbt^at{M^), 
Fa,c{MN) = {FaAM) I MFb^c{N)), 
GaMMN) = {N^Ga.b{M) I Gb.c{N)), 

FaAm-^) = -m-^GbAm), GaAm-^) - m-^FbAm). 

Consequently 

dA,B{aM + (3N) < dAMM) + dA.B{N), 

dA,B{M^)^dBT,AT{M), 

dA,c{MN) < dAMM) + dB,ciN), 

dA.B{M-^) = dBAM). 

The corollary and Theorem [13] together reduce the inversion of a nonsingular nxn matrix M 
given by its displacement generator of a length d to solving 2d linear systems of equations with this 
coefficient matrix M, rather than the n linear systems Afx^ — ei, i — 1, . . . ,n for general matrix M. 

Given short displacement generators for the matrices M and N, we can apply Corollary [TH] and 
readily express short displacement generators for the matrices M'^, aM + j3N , and MN through 
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Table 3: Operator matrices for matrix product 



p 


M 


N 


PMN 


c 


A 


B 


C 


A 


B 


D 


D 



the matrices M and N and their displacement generators, but the expressions for the displacement 
generator of the inverse involve the inverse itself. 

Some fast inversion algorithms such as the divide-and-conquer MBA algorithm of jM80) and 
[BA80j involve auxiliary matrices with displacement generators whose lengths exceed the displace- 
ment rank shared by the input and output matrices. If uncontrolled, the divide-and-conquer process 
can blow up the length of the displacement generators and the computational cost (see |POH Chapter 
5]), but one can apply Theorem [T] to recompress the generators and obtain the following result. 

Corollary 19. \M8C^ . \BA80f . The MBA algorithm computes a displacement generator of the 
inverse of a nonsingular n x n matrix M by using 0{(Pn\og^ n) flops where s = 2 if M is from the 
class T or H and s ^ 3 if M is from the class V, V'^ , or C. 

5 Transformation of Matrix Structures 
5.1 Maps and multipliers 

Recall that each of the five matrix classes T, H, V, , and C consists of the matrices M whose 
displacement rank, rank(y4M — MB) is small (in context) for a pair of operator matrices {A, B) 
associated with this class. Thus these pairs represent the structure of the matrix classes. 

Theorem [17] shows the impact of elementaty matrix operations on the associated operator matri- 
ces A and B. For linear combinations, transposes and inverses the original pair (A, B) either stays 
invariant or changes into {—B^, A'^) or {—B, A), respectively. If the inputs are in any of the classes 
T, H, and Cs,t, then so arc the outputs, whereas the transposition maps the classes V and V"^ into 
one another and the inversion maps them into the classes and V~^, respectively. The impact 
of multiplication on matrix structure is quite different. As we can see from (|16p and Table [S] the 
map M — > PMN can imply transition from the associated pair of operator matrices {A, B) to any 
new pair (C, D) of our choice, that is we can transform the matrix structures of the five classes into 
each other at will. The following theorem and Table |4] specify such structure transforms given by 
the maps M MN, N ^ MN, and M PMN for appropriate multipliers P, M, and N. 

Theorem 20. It holds that 

(i) MN ^ T if the pair of matroces {M,N) is in any of the pairs of matrix classes {T,T), 

iH,H), (V-\Vs) and (Vj,V-^), 

(a) MN e n if the pair {M,N) is in any of the pairs {T,n), {n,T), (Vs"\ Vg^^) and {V^,Vs), 
(Hi) MN £ Vs if the pair {M,N) is in any of the pairs (Vs,T), {V~^,H), and (Cs,t,Vt), 

(iv) MN G Vj if the pair (M, N) is in any of the pairs (T, Vj), (H, V''^) and (V^,Cq,s), 

(v) MN G Cs,t if the pair {M,N) is in any of the pairs (Cs,q,Cq^t); O^s'^t'^s) ^'^'^ 0^s,'^s^)r 
(vii) PMN eCs,t if the triple {M,N,P) is in any of the triples {Vs,n,V^) and {V^'^ ,n,V^^). 

The maps of Theorem [20] and Table [4] hold for any choice of the multipliers P and N from the 
indicated classes. To simplify the computation of the products MN, MP and MNP, we can choose 
the multipliers J, Vr, , V'"^ , , and Cp r, all having displacement rank 1, to represent the 
classes H, Vr, Vj, V^^, V^"^, and Cp^r of the table and the theorem, respectively, where p and 
r can stand for q, s, and t. Hereafter we call this choice of multipliers canonical. We call them 
canonical and DFT-hased if up to the factor J they are also DFT-based, that is if p and r are of the 
form /(ll'^^^)"=i. These multipliers are quasiunitary where |/| = 1. By combining CoroUarv 1181 and 
Theorem [20] we obtain the following result. 
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Table 4: Mapping matrix structures by means of multiplication 
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n 






Cs,t 


rr, vjv-^ 




VsT, Cs.tVt 




V-^H^, V,Vt-\ C,,qCq,t 













Corollary 21. Given a displacement generator of a length d for an n x n matrix M of any of the 

classes T, %, V, V"^ , andC, 0{dn\o^ n) flops are sufficient to compute a displacement generator 
of a length at most d + 2 for the matrix PAIN of any other of these classes where P and M are 
from the set of canonical multipliers complemented by the identity matrix. The flop hound decreases 
to 0{dn\ogn) where the canonical multipliers are DFT-hased. 

One can simplify the inversion of structured matrices M of some important classes and the solu- 
tion of linear systems Mx = u by employing preprocessings M — > PMN with appropriate structured 
multipliers P and N. For an example we can decrease the complexity bound of 0{d^n\o^ n) of 
Corollary [T9l to 0{d'^n\o^ n) by applying canonical transformations ([T]) of the matrices of the bot- 
tleneck classes V, and C into the "easier" matrices of the classes T and %. 

5.2 The impact on displacements 

In the canonical maps of Theorem [2D] the displacement ranks grow by at most 2 but possibly less 
than that, as this is implied by Theorems [52] and [22] below. In our constructive proofs of these 
theorems we also specify the multipliers P and N and compute the displacement generators for the 
products PMN of some maps of Theorem [20] In some maps we set P = I or N = I, thus omitting 
one of the multipliers. We show no maps where the matrices M or PMN belong to th classes V"^, 
V"\ or V"^, but they can be generated from the maps with M € V or PMN € V by means of 
transposition and inversion. 

Theorem 22. Given a displacement generator of a length d for a structured matrix M of any of the 
four classes T, T-L, V , and C, one can obtain a displacement generator of a length at most d + 2 for 
a matrix PMN belonging to any other of these classes by selecting appropriate canonical multipliers 
P and N among the matrices I (from the class T), J (from the class %), Vandermonde matrices 
V and their transposes V'^ . Namely, if we assume canonical multipliers P and N, then we can 
compute a displacement generator of the matrix PMN having a length at most d where the map 
M — > PMN is between the matrices M and PMN in the classes % and T . This length bound grows 
to at most d -\- 2 where M is in the class T or %, whereas PMN £ C or vice versa, where M G C 
and PMN is in the class T or %. We yield dispacement generators of at most lengths d + 1 in the 
maps M PMN that support all other transitions among the classes T, %, V and C . 

Proof. We specify some maps M MNP that suport the claims of the theorem. One can vary 
and combine these maps as well as the other maps of Theorem [20] and Table [4] 

(a) T — > "H, PMN = JM . Assume a matrix M G T, a pair of distinct scalars e and /, and 
a pair of n x d matrices F = Fz^,Zf{M) and G = G z^,Zf{M) for d = dz^,Zf{M) satisfying the 
displacement equation Z^M — MZf = FG^ . Pre-multiply this equation by the matrix J to obtain 
JZeM - {JM)Zf = JFG'^. Rewrite the term JZ^M = JZ^JJM as Z'^JM by observing that 
JZeJ = Zl (cf. (0) for / ^ e). Obtain Z^{JM)-[JM)Zf = JFG^ . Consequently FzT^z^iJM) = 
JF, GzT^ZfiJM) ^ G, dzT^ZfiJM) = dz„ZfiM), and JM eU. 

(b) T V, PMN — VM. Keep the assumptions of part (a) and fix n scalars si...,s„. 
Pre-multiply the displacement equation Z^M — MZf — FG^ by the Vandermonde matrix V = 
{4~^)Zj=i to obtain VZeM-{VM)Zf = VFG^. Write s = {s^)f^l and substitute equation ^ to 
yield D^iVM) ~ {VM)Zf = VFG^ + - e)^^,e'^M = FvmG^m for Fvm = {VF \ (sf - e)f^i) 

and = ( \ . So dD,,Zs{VM) < dz^^Zf{M) + 1 and VM e V. 
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(c) H ^ T, PMN — MJ. Assume a matrix M G H, a pair of scalars e and /, and a pair oinxd 
matrices F and G for d = ^t(M) satisfying the displacement equation Z^M — MZj — FG^. 

Post-multiply it by the matrix J to obtain Z^iMJ) - MZjj = FG'^J. Express the term MZjj = 
MJJZjj as MJZf (cf. dl])) to obtain Z^iMJ) - {MJ)Zf = FG^ J = F{JG)^ and consequently 
Fz^,Zf{JM) = F, Gz^,Zf{JM) = JG, dz^,Zf{MJ) = dz^^zjiM) and MJ e T. 

(d) v. ^ V. Compose the maps of parts (c) and (b). 

(e) V — > PMN — V'^M. Assume n + 2 scalars e, /, si, . . . , s„, a matrix M £ V, and 
its displacement generator given by n x d matrices F and G such that DgM — MZj — FG^. 
Pre- multiply this equation by the transposed Vandermonde matrix V"^ = {s^j'^)^j^i to obtain 
V'^DsM - {V^M)Zf = V^FG'^ for s = (si)r=i- Apply equation ([TT]) to express the matrix 
F^Ds and obtain Zj{V^M) - {V^M)Zf = FG^ ~ e„((sf - eYLiV M = FvtmGI-tm for 

FyTM = {V^F I e„) and G^,^ = (((^ _ jt^^) ■ So dz-.z,(^^A^) < dD.,z,(A/) + 1 and 

v^M € n. 

(f ) V T. Compose the maps of parts (e) and (c) . 

(g) V — > C, PMN = MJV'^. Assume 2n + 1 scalars e, si, . . . , s„, ii, . . . , a matrix 
A/ G V, and its displacement generator given by n x d matrices F and G. Post-multiply the 
equation D^M - MZe = FG'^ by the matrix JT^^ where V'^ = is the transposed 
Vandermonde matrix, substutute Z^J ^ JZj, and obtain D^{MJV'^) - MJZjV'^ = FG^ JV'^ 
for s — (si)"=i- Apply equation (fTTj) to express the matrix ^Jt^"^ and obtain Ds(MJF-^) — 
(MJV^)Dt = FG^JV^-MJe,,{{t^-e)^^^Y = -F^Mjy^G^^yT where Fmjv^ = (i^ | AfJe„) and 

GlnvT ^\(( iln ^T ■ So dD^^oAMJV^) < dD^,zSM) + 1 and MJV'^ £ C. 



We can alternatively write PMN = MV^^ for ^ = 14. (The matrix V can be readily inverted 
where it is FFT-based, that is where V = Vf.) Post-multiply the equation DgM — MZe = FG^ 
by the matrix V'^ = V^'^, for t = iU)^^^, to obtain D^MV'^ ~ MZ^V'^ = FG^V~^. Pre- and 
post-multiply by equation PH)) for s replaced by t and obtain Z^V^^ = V^^Dt — V^^{t'2 — 
e)"'.e'IV~'^. Substitute the expression oi Z^V ^ from this equation into above equation and obtain 
D,{MV-^) - {MV-^)Dt = FG^V-^ ~ V-\t^ - e)f^,elV-^ = Fmy-^GIj^^, for Fmv-^ = 

{F I ~ e)f^i) and G^_ij,, = (Jv^-i) ' ^° dD,,D,{VM) < do^^zAM) + 1 and MV~^ £ C. 

(h) C — > V, PMN = AIV. Assume 2n + 1 scalars e, si, . . . , s„, ti, . . . ,tn, a matrix M £ C, and 
its displacement generator given by n x d matrices F and G such that DsM ~ MDt = FG^ for 
s = {si)f^i and t = Post-multiply this equation by the Vandermonde matrix V ~ 

to obtain Ds(MF) — MDtV = FG'^V. Express the matrix DtV from matrix equation (fTO|) and 
obtain i:)s(MV^)-(Aft/)Ze = FG'^V+M(tf-e)''^^^el = FMvGljy where Fmv = (i^ | M(t^-e)f^i) 

and Gl^^y = {^It^ ■ So do^^zAMV) < do^^oAM) + 1 and AfF £ C. 

(i) C ^ T. Compose the maps of parts (h) and (f). 
{]) C ^ H. Compose the maps of parts (h) and (e). 
(k) T ^ C. Compose the maps of parts (b) and (g). 

(i) % ^ C. Compose the maps of parts (d) and (g). □ 

Multiplications by a Cauchy matrix keeps a matrix in any of the classes T, H, V and C, but 
changes a diagonal operator matrix. Next we specify the impact on the displacement. 

Theorem 23. Assume 2n distinct scalars si, . . . , s„, ti, . . . , tn, defining two vectors s = (si)"^]^ and 
t = {tj)j'^i and a nonsingular Cauchy matrix G — Gs.t — ( ^ ■ )?j=i ('^f- pori (i) of TheoremlSj). 
Then for any pair of operator matrices A and B we have 

(i) dA.oAMG) < dA.oAM) + 1 and 

(ti) dD,^B{GM) < dD.AM) + 1. 

Proof, (i) We have dA,DAM) = rank(yiAi" - MD^) = rank(AA4"G - AfD^G). Furthermore AMG - 
MGDt = AMG - MD^G + MD^G - MGDt = {AM - MD^)G + M(D^G - GDt). Substitute 
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equation ^ and deduce that AMC- MCDt = {AM - MD^)C + Mee^ . Therefore dA.DtiMC) = 
rank(AAfC - MCDt) < rank((y4M - MD^)C) + 1 = rank(ylM - MD^) + 1 = cIa.dAM) + 1- 

(ii) We have D^CM-CMB = D^CM -CDtM + CDtM -CMB = {DsC -CDt)M + C{DtM - 
MB). Substitute equation (US]) and deduce that D^CM - CMB = C{DtM ~ MB) + ee^M. 
Therefore do^.BiCM) = rank(i:>sCAf - CMB) < Tank{C{DtM - MB)) + 1 = rank(i:>tM - MB) + 
l = dD„BiM) + l. □ 

5.3 Canonical and DFT-based transformations of the matrices of the 
classes T, T-L, V and into CV-like matrices 

MultipUcation by a Vandermonde muhipher V — (s^ ^)?j=i '^^ tiy its transpose in CoroUarv 1181 
increases the length of a displacement generator by at most 1, but in the proof of Theorem 1221 such 
a multiplication does not increase the length at all where — e for i — 1, . . . , n and for a scalar 
e, employed in the operator matrices and of the Vandermonde displacement map (cf. (|10p 
and (fTTj) '). This suggests choosing the vectors s = {eujl^^Y^Li and t = (/a;^~^)"=i and employing 
the DFT-based multipliers T4 and Vf (cf. ^) wherever we are free to choose these vectors and 
multipliers. In particular we can choose such DFT-based multipliers in our maps supporting part 
(g) of Theorem[211 and then we would output matrices of the class CV having the same displacement 
ranks as the input matrices M. Furthermore the inverse of the matrix V = Vt, employed in our 
second map supporting part (g), would turn into DFT-based matrix V/, and we could invert it and 
multiply it by a vector by using 0{n\ogn) flops (cf. Theorem [7]). We deduce the following results 
by reexamining the proof of Theorem [22] and applying transposition. 

Theorem 24. Some appropriate canonical DFT-based multipliers from the proof of Theorem 
for the basic vectors s = {e^^l^^)f^i and t = (fujl^^)f^i support the following transformations of 
matrix classes (in both directions), T ^ J-V o TCT , T-L O o TCT , V -f-)- CT , V"^ O TC, and 
V U <-> CV. The multipliers are quasiunitary where |e| — \ f\ = 1. 

By combining our second map of the proof of part (g) of Theorem [52] with our map from its 
part (b) and choosing t = (/i^^^^)"=i, we can obtain canonical DFT-based transforms T — >• C = 
ilT diag{f^~^)2^iil^ , which are quasiunitary where |/| = 1. For / = uj2n they turn into the 
celebrated map employed in the papers [H95], |G!k095] . [U98] . |MkT05| . [HOB] . |C6SQ7) . |XXG12) . 
The following theorem shows the implied map of the displacement generators. 

Theorem 25. Suppose ZiM — MZ^i ~ FG^ for an n x n matrix M and n x d matrices F and 
C and write P = ftn, N = uj^^n^^ , C = FMN, D = diag(w2;^)f^i, and D = D^ ^ diag(w^-i)^^i . 
Then DC - u}2nCD = FcG^ for Fc = flnF and Gc = uj2n^nDoG. 

This theorem and the supporting canonical DFT-based map T —?' C are the special cases of 
Theorem [21] and its transforms of matrix structures, but they appeared in [H95] as corollaries 
of Theorem [S] In his letter of 1991, reproduced in [Pill Appendix C], G. Heinig acknowledged 
studying the paper |P90j . but his alternative derivation in |H95) appeared ad hoc and has defined 
a more narrow class of transforms of matrix structures than Theorem [221 extending P90 ■ Heinig's 
specialization of the structure transformation method, however, has paved way to the subsequent 
strong demonstration of the power of the method in |GK095| . p98] . |MRT05| . [R06] . |CGS07| . 
[XXG12| and has specified an efficient quasiunitary map T — >■ C above, which employed the uniform 
distribution of the 2n knots Si, . . . , s„, ti, . . . ,tn on the unit circle {z : \z\ — 1}. 

6 HSS matrices 

The following class of structured matrices extends the class of banded matrices and their inverses. 

Definition 26. Hereafter "_H5'5" stands for "hierarchically semiseparable" . An n x n matrix is 
(Z,it)-HSS if its diagonal blocks consist of 0{{l + u)n) entries, if I is the maximum rank of all its 
subdiagonal blocks, and if u is the maximum rank of all its superdiagonal blocks, that is blocks of 
all sizes lying strictly below or strictly above the block diagonal, respectively. 
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This definition is one of a number of similar definitions of such matrices, also known under 
the names of quasiseparable, weakly, recursively or sequentially semiscparable matrices, as well as 
matrices with low Hankel rank and rank structured matrices. See [VVGM05 , VV M07] . |VVM08) . 
and the bibliography therein on the long history of the study of these matrix classes and see |GR87) , 
[LRT79j ■ |PR93 ' on the related subjects of Multipole and Nested Dissection algorithms. 

A banded matrix B having a lower bandwidth I and an upper bandwidth u is an (Z,m)-HSS 
matrix, and so is its inverse if the matrix B is nonsingular. It is well known that such a banded 
n X n matrix can be multiplied by a vector by using 0{{l + u)n) flops, whereas 0{{l + u)'^n) flops 
are sufficient to solve a nonsingular linear system of n equations with such a coefficient matrix. 
Both properties have been extended to (/, u)-HSS matrices of the size nx n (see [MRTOSj . |CGS07) , 
|XXG12| V Furthermore, like the matrices of the classes T, H, V and C, such an HSS matrix allows 
its compressed representation where one defines its generalized generator that readily expresses its 

entries via 0{{l + u)n) parameters. The inverse of a nonsingular (Z,m)-HSS nx n matrix M is 
also an [I, u)-HSS n x n matrix, and a generator expressing the inverse via 0{(l + u)n) parameters 
can be computed by using 0((l + u)'^n) flops. See XXG12J and references therein on the supporting 
algorithms and their efficient implementation. 



7 Numerical ranks of Cauchy and Vandermonde matrices 

Next we bound numerical rank for a large subclass of the class of Cauchy and Cauchy-like matrices. 
In the next section we extend this study to approximate CV and CV-like matrices by HSS matrices. 

Definition 27. A pair of complex points s and t is {9, c)- separated for 9 < 1 and a complex point 
c if l^zrl I < 9. Two sets of complex numbers § and T are {9, c)-separated from one another if every 
pair of elements s € § and t e T is (0, c)-separated from one another for the same pair {9,c). 
i5c,s = niiusgs |s — c| and S^j = min^gT \t — c| denote the distances of the center c from the sets S 
and T, respectively. 

Lemma 28. iR85^ . Suppose two complex points s and t are {9, c)- separated from one another for 
< 9 < 1 and write q = ■^Ef ? 1 9 1 5: ^- Then for every positive integer k we have 



fe-i 



— E 7^4i + ^ ^l^e^e 1*1 ^ - d). (18) 



[s- cr s - c 

1=0 



Proof ^ = ^ - ^ ESo 9' = I^(Eto <!' + EZ, = I^(Eto <!' + fy- □ 

Corollary 29. (Cf mRTOSf . \CGS07[ Section 2.2].) Suppose C ^ {j-zt-)'i.j=i a Cauchy matrix 
defined by two sets of parameters § = {si,...,s„} and T — {ii,...,t„}. Suppose these sets are 
{9, c) -separated from one another for < 9 < 1 and a scalar c and write 

n 

^ = (^cs = min |sj - c|. (19) 

i— 1 

Then for every positive integer k it is sufficient to use 2kn + An ops to compute two matrices 

F = (l/(., - cf)-:^+l = {{t, - c)XLo (20) 
that support the representation of the matrix C as C — C + E where 

C = FG^, rank(C') <k + l, (21) 

E = (e^,,)"J=l, KjI < (Y^-^ fo'^ aU pairs (22) 
and so \\E\\ < nq^' / {{1 — q)8). 
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Proof. Apply (|18l) ioi s — Si, t — tj and all pairs to deduce (|22l) . 



□ 



Here are three immediate extensions of the theorem and the corollary. 

(i) We can replace S = Scs — mhi"^^ \si — c\ hy S — Sct = min"^^ \tj ~ c\ because Cj^. = — Ct s 
(cf. 0). ' 

(ii) By virtue of part (c) of Theorem [T51 we can extend the bounds of Corollary [23] from a Cauchy 
matrix Ca,t to a Cauchy- like matrix of the class Cs,t, given with a displacement generator {F, G) of a 
length d. In this extension the rank bound (|21|) increases by a factor of d and the error norm bound 
([221) increases by a factor of d \\F\\ \\G\\. 

(iii) Already for moderately large integers k the upper bounds of ([22]) are small unless the 
values 1 — 9 > Q and 5 of are small. Then Corollary [51] implies an upper bound fc + 1 on 
the numerical rank of the large subclass of Cauchy matrices C = ( ^.1^ . whose parameter 
sets S — {si, . . . , s„} and T = {ti, . . . , t„} are (9, c)-separated from one another for an appropriate 
center c. If this property holds for two subsets of the sets § and T that define an n x ? or an ^ x n 
Cauchy submatrix where I > fc + 1, then the I rows or columns of this submatrix form a nearly rank 
deficient matrix, which means that the matrix C is ill conditioned. Apply a canonical DFT-based 
quasiunitary map V — >■ CV that supports part (g) of Theorem [22] (see Theorem [24]) and deduce that 
a Vandermonde matrix Vt is ill conditioned unless its knots from the set T = {^i, . . . , t„} are close 
enough to all or almost all knots of the set of the nth roots of 1, scaled by a scalar e, 
|e| = 1. This implies (cf. jGI88] ) that except for a narrow subclass all Vandermonde matrices are ill 
conditioned. 



8 HSS approximation of CV and CV-like matrices 

Theorem 30. Assume positive integers g, h and n, a scalar e, and a Cauchy matrix C = Cg.e = 
i s -t '5^'^^ ^^"^^ — ^i^n^^) /"^ j — ^, ■ ■ ■ ,n (cf. Section \2.2\) . gh — n, n is not small, and 

\e\ = 1. Then there is a permutation n x n matrix P such that CP is a 3 x g block matrix with block 
columns {Cj_ \ Ej | Cj_^)'^ , j = 0, . . . , g — 1, where the diagonal blocks Sj have sizes Uj x h, and the 
rows of the blocks Sj and S^. lie in pairwise distinct sets of rows of the matrix CP unless \ j — fc| < 1 
or \j — k\ = g — 1 (and so the blocks Ei, . . . , Eg together have at most 3hn entries), whereas every 
matrix {Cj_ \ Cj_^_)'^ is an h x in — nj) Cauchy matrix defined by the sets of parameters that are 
{1/2, Cj)- separated from one another for some scalars Cj lying on the unit circle {z : \z\ = 1} and 
at the distance of at least 0.5h/n^ from the set Sj. 

Proof. Represent the knots si, . . . , s„ of the set § in polar coordinates, Si = ri exp(27r(/)iV~l) where 
ri > Q, Q < (f>i < 27r, 0^ = if n = 0, and j = 0, 1, . . . , n— 1. Re-enumerate all values (fn to have them 
in nonincreasing order and to have i^q""^'' — min,"^g 0^ and let P denote the permutation matrix that 
defines this re-enumeration. To simplify our notation assume that already the original enumeration 
has these properties and that e = 1. Let §j = {sj}j G § and = G T denote the sets 

of knots lying in the semi-open sectors of the complex plane bounded by the pairs of rays from the 
origin to the points w^^'* and ujn^^''^ , respectively. Namely denote by Sj and Tj the subsets of the 
sets § and T made up of the knots whose arguments (pj satisfy 2Trjh/n < (j>j < 27r{j{h + 1) — !)/«, 
j=0,...,5-l. 

Write a{a, b) to denote the arc of the unit circle {z : \z\ = 1} with the end points a and b. For 
every j,j — 1, 5, choose a center Cj on the arc ^(w^^-'^^^'*, cj^^^-'^'^'''). This arc has the length 7r/i/n 

and shares the midpoint uj^J'^^^'^ with the arc a{uj^'^ ,uJn'^^^'^), having the length 27r/i/n. Choose 
the center cj at the distance at least 2h/n'^ from the set § (as we required). This is possible because 
the set has exactly n elements. For j — 0, . . . , g — 1, index by jh, . . . , j(ft. + 1) — 1 the columns shared 
by the blocks Cj.-, Ej and Cj.+ and index the rows of the blocks Ej by the indices of the elements 
of the set §j_i U §j U Sj+i. Note that the sets §>j and Tk — Wn}\kli)h Cj)-separated from 

one another unless |j — fc| < 1 or |j — A;| = g — 1, and this implies the separation property claimed 
in the theorem. □ 
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Apply Corollary [551 foi" 9 1/2, S ~ Q.bh/n?, C = (C„._ | C„^+)-^, and u = 1, . . . , g and obtain 
the following corollary. 

Corollary 31. The matrix PC of Theorem \30\ can be represented as 

PC ^T. + d + E (23) 

where S is the block diagonal matrix diag(I]„)^^-^, rank(C) < (fc + l)g, E — {eij)^j^i, ^ 
/or a/^ pairs and so \ \E\ \ < /h. 

Remark 32. Theorem [301 and the corollary can be immediately extended to the case where h 
does not divide n (in this case write g = \n/h'\) as well as to the case where C = for 

Si = ew'~^ for all i and |e| = 1 (because Cs,e — —C^s (^f- ©))■ Theorem [T^ implies an extension 
to the matrices M of the class CV, with the increase of the rank bound by a factor of d and with the 
increase of the approximation norm bound by a factor of d | |i^| | | |G| | provided the matrix M is given 
with its displacement generator {F, G) of a length d. The proof technique of Theorem [30] enables 
various further extensions. Clearly one can allow any variation of the set T as long as its elements 
can be partitioned into /i-tuples, each lying on or near the arc of the unit circle {z : \z\ = 1} with 
the endpoints ujH'' and ujn^^^'^ . Furthermore the proof can be readily extended to the case where 
a line interval of a length between 1 and 2 (say) lying on the complex plane not very far from the 
origin (or on an approximation of such a line interval by a segment of a curve) replaces the unit 
circle {z : \z\ = 1} and where the set T can be partitioned into /i-tuples that are more or less equally 
spaced on this interval (or the segment). 

The block diagonal matrix E has at most 3hn entries. The matrix C consists of the off-diagonal 
blocks. B y combin ing Theorem [30| and Corollary [31] with the HSS techniques of |GR87) . [MRT05) . 
[CGS07| . [XXG12| . deduce that for a positive constant b and the integer k = [3(6 + 2) log2 n] , the 
matrix C of (|55)) is an (/, u)-HSS matrix where l + u < ckh, h < c' logn, n^2'^^^ /h < 2~^, and c and 
c' are two constants. 

9 Multiplication of the matrices of the classes CV, V, V^, and 
C and their inverses by vectors 

Suppose fJ-{M) denotes the minimum number of flops sufficient for multiplying a matrix M by a 
vector and estimate n{C) = fi{PC) for the matrices of Corollary [HU The matrix S has at most 
3/in nonzero entries, and so /^(S) < 6hn— n. Furthermore fJ.{C) = 0{n\ogn) because the matrix C 
has the (/, u)-HSS structure for I + u < ckh and h < c' logn (see Section [6]). Let us summarize the 
estimates for the CV matrices with an extension to the matrices of the classes CV, V, and V'^ . 

Theorem 33. (See Remark \36l ) Assume a positive scalar b, a complex e such that \e\ — 1, and 
two vectors f and s of dimension n. (i) Then one can approximate the product Mf within the error 
norm bound ||Af|| ||f|| by using 0{bn\ogn) flops provided that M is a CV, Vandermonde or 
transposed Vandermonde nx n matrix Cs.e, Ce,s, Vg or V^ , respectively, (ii) The flop bound for 
solving a nonsingular linear system of n equations with the coefficient matrix in the above classes 
increases versus part (i) by a factor of logn and the error norm bounds increases by a factor of 
||M~"'^||/||M||. (Hi) The flop bounds of parts (i) and (ii) also hold for approximate evaluation of a 
polynomial of degree n ~ I at n points and for approximate interpolation to this polynomial from its 
n values, respectively, (iv) The flop bounds of parts (i) and (ii) increase by a factor of d, whereas 
the error norm bounds increase by a factor of d \\F\\ \\G\\ where M is a matrix from the class Cg.e, 
Ce,s, Vs or (having the structure of CV, Vandermonde or transposed Vandermonde type) given 
with a displacement generator {F, G) of a length d. 

Proof. Summarize our estimates above to deduce the bound of part (i) in the case of CV matrices 
Cs,e and Ce s- Apply Theorem [2] to estimate the approximation errors of solving the linear systems 
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of equations and extend the bounds of part (i) to part (ii). To extend the estimates of parts (i) 
and (ii) to the case of Vandernionde matrices T4 , apply the canonical DFT-based specialization of 
a map supporting part (g) of Theorem for the DTF-based matrix V — Vf where |/| = 1, and 
so 11^/ 1 1 = v^- The map increases the approximation error norm (versus the case of CV matrices 
Cs,e and Ce.s) by a factor of ^mm"^i js^^^TT' ^^'^'^^^ ^ complex /, |/| = 1, that keeps this factor 
below Sny^. Compensate for this increase of the norm bound by adding log2(3n-\/n) to the value 
k. Similarly multiply a transposed Vandermonde matrix by a vector, transpose a map that supports 
part (g) of Theorem [22l and employ equation ([2]). Extend the results of parts (i) and (ii) to part (iii) 
by applying Theorem (4] Extend them to part (iv) by applying parts (v), {v'^), and (c) Theorem 1 131 
choosing a scalar e in parts {v) and {v'^) such that min"^]^ |e — s"| > 1/2 (say), and increasing the 
integer parameter k by [log(en)] (to compensate for the exceess of the norms ||Ze(fi)|| and ||Ze(gi)|| 
above ||F|| and ||G||, respectively). □ 

Remark 34. One can ignore the HSS structure of the matrix C and still approximate the matrix 
product Cs,ef at the cost bounds that are smaller than the known bounds by a factor of y^n/ logn. 
Indeed choose h of about y/ n log n and choose g of about y/n/ log n in Corollary [3l] and obtain the 
matrix C of a rank of order -^/n log n. We can multiply this matrix by a vector by using 0{n^/n logn) 
flops. The estimate is extended to the overall cost of multiplying the matrix E+C by a vector because 
we can multiply the matrix S by a vector by using 6hn — n flops and because h = 0(\/nlog n). 

Remark 35. We can extend Theorem [551 similar Iv to the extensions of Theorem 1501 and Corollary 
15T] in the second part of Remark 15^ 

Remark 36. The algorithms supporting Theorem l33l can be naturally partitioned into two stages. 
At first we apply canonical DFT-based transformations of Theorems l22l - l24l and [33| and Corollary 
l29l to reduce our tasks to computations with HSS matrices. At this stage we propose a novel 
specialization of the approach of |P90j . Then it remains to apply the Multipole algorithms, which is 
both powerful and well developed. We perform the former (FFT-based) stage by applying 0{n log n) 
flops. The latter (Multipole/HSS) stage involves 0{{l + u)n) flops for multiplication of an n x n HSS 
matrix by a vector and 0{{l + u)'^n) flops for solving a nonsingular HSS linear system of n equations, 
and we have the bound I + u = O(logn) in our case. Empirically, however, in the extensive tests 
in |XXG12) for HSS computations similar to ours, the value (l + u)^ grew much slower than logn 
as n grew large, and so we can expect that the computational cost at the flrst (FFT) stage of the 
algorithms actually dominates their overall computational cost. 

Clearly, the algorithms supporting Theorem[33|are efficient not only for CV and CV-like matrices, 
but for a larger subclass of the class of Cauchy-like matrices (cf. Remark [32|). The extension to the 
general Cauchy and Cauchy-like matrices can lead to numerical problems, however. Here are some 
sketchy comments. Suppose that s, t and u denote three vectors of dimension n and that an n x n 
Cauchy-like matrix M S Cs,t is given with a displacement generator {F, G) of a length d. Then for 
a large class of vectors s and t , one can extend Theorem [30| and reduce the approximation of the 
vectors Mu and of the solution x to a linear system of n equations Afx = u (if it is nonsingular) to 
HSS computations (cf. Remark 15^ . 

Furthermore for all input vectors s, t and u, we can apply our techniques of transforming matrix 
structures to reduce the solution x of the linear system Mx = u to some computations with CV 
matrices and to the computation of the product of the matrix M by the vector e as follows. Fix 
a scalar e, write P = MCt,e and x = Ct.eY, and note that Py =u, whereas P E Cs.e is a CV 
matrix with the displacement generator {Fp, Gp) oi length at most d+1 where Fp — {F \ Me) and 
Gp — {C^^G I e). By applying these techniques to the matrix e Ct,s we can alternatively reduce 
the linear system A/x = u to the computation of the products M'^e and to some computations with 
CV matrices. In both cases application of the algorithms would require additional error analysis. 
E.g., the approximation errors of computing the matrix P would magnify the approximation errors 
for the vectors y (cf. Theorem [2|) and x. 

Next we consider another extension of our techniques and make further comments on error 
propagation. Part (c) of Theorem 1T51 enables us to reduce the approximation of the vector x — Mu 
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to the approximation of the d vectors Cs.tv^ for = diag(gi),f^]^u, g; = Ge^, and i = l,...,d, 
and to 0{n) additional flops, provided the matrix M £ Cs,t is given with its displacement generator 
{F, G) of a length d. (For d — 1 and {F, G) = (1, 1) we arrive at the problems of rational multipoint 
evaluation and interpolation (see part (ii) of Theorem 2]).) Equation ([3]) reduces multiplication Gg.tv 
to multiplication of each of the matrices Vg and T/^"^ by d + 1 vectors, to one multiplication of the 
matrix Vt by a vector, and to 0{n) additional flops (cf. Theorem |4]). We can apply the new fast 
algorithms to approximate the 2d+3 matrix-by- vector products above, but the approximation errors 
can readily propagate in this application of the algorithms. 

10 Conclusions 

At first we revisited our approach of |P90j to the transformation of matrix structures, covered it 
comprehensively, and simplified its presentation by employing the Sylvester (rather than Stein) 
displacements and the techniques for operating with them from jPOO] and [POll Section 1.5]. Then 
we singled out a large subclass of Cauchy-like matrices, which we call the CV-like matrices. We 
closely approximated these matrices by HSS matrices and then applied the Multipole method to 
the latter HSS matrices. This yielded dramatic acceleration of the known numerical algorithms that 
approximated the products of CV and CV-like matrices by vectors and the solution of nonsingular 
linear systems of equations with CV and CV-like coefficient matrices. Namely the running time of 
the new algorithms is nearly linear, versus quadratic time required by the known algorithms. By 
properly transforming matrix structures we have readily extended such an acceleration of the known 
algorithms to the matrices having structures of Vandermonde and transposed Vandermonde types, 
and consequently to numerical multipoint evaluation and interpolation of polynomials. 

Potential extensions and specializations include computations with confluent Vandermonde ma- 
trices, Loewner matrices, and various problems of rational interpolation such as the Nevanlinna-Pick 
and matrix Nehari problems (cf. |P011 Chapter 3] and the bibliography therein), where, however, 
the progress can be limited to the case of sufficiently well conditioned inputs. Our demonstration 
of the power of the transformation of matrix structures should motivate research efforts for finding 
new inexpensive transforms of matrix structures and their new algorithmic applications. Natural 
topics of further study should include the following issues: 

(i) extension of our our approach to a larger class of Cauchy and Cauchy-like matrices (cf. 
Remark [321), 

(ii) the impact of the conditioning of the input on the output errors and the running time, 

(iii) the estimation of the treshold input sizes for which the proposed algorithms running in nearly 
linear time outperform their variant of Remark and the known algorithms, running in quadratic 
time, and 

(iv) implementation of the proposed algorithms. 

The implementation should be mostly reduced to the application of the Multipole algorithms 
and should extend [XXG12] , but the actual work should prompt the refinements toward decreasing 
the treshold values of part (iii). 
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